# Dynamic Resolution Processing

Internvl3 38B Instruct GGUF
Apache-2.0
InternVL3-38B-Instruct is an advanced Multimodal Large Language Model (MLLM) that demonstrates exceptional overall performance, with strong multimodal perception and reasoning capabilities.
Image-to-Text Transformers
unsloth
1,236
2
Biqwen2 V0.1
Apache-2.0
BiQwen2 is a visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, focusing on efficient visual document retrieval.
Text-to-Image Safetensors English
vidore
460
0
Qwen2.5 VL Instruct 3B Geo
Apache-2.0
Qwen2.5-VL is the latest vision-language model in the Qwen family, focusing on enhanced visual understanding and agent capabilities.
Text-to-Image Transformers English
kxxinDave
29
2
Colqwen2.5 3b Multilingual V1.0 Merged
MIT
A multilingual visual retrieval model based on Qwen2.5-VL-3B-Instruct and the ColBERT strategy. It supports dynamic input image resolution and generates ColBERT-style multi-vector representations of text and images.
Text-to-Image Transformers Supports Multiple Languages
tsystems
70
0
Qwen2.5 VL 72B Instruct AWQ Fix
Other
Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring powerful visual understanding and agent capabilities, supporting multi-format visual localization and structured output generation.
Image-to-Text Transformers English
Benasd
94
1
Colqwen2.5 7b Multilingual V1.0
MIT
A multilingual visual retrieval model based on Qwen2.5-VL-7B-Instruct and the ColBERT strategy, ranked first on the ViDoRe benchmark.
Text-to-Image Supports Multiple Languages
Metric-AI
4,699
7
Colqwen2.5 3b Multilingual V1.0
MIT
A multilingual visual retrieval model based on Qwen2.5-VL-3B-Instruct and the ColBERT strategy, with strong results on the ViDoRe benchmark.
Text-to-Image Supports Multiple Languages
Metric-AI
2,475
7
Qwen2.5 VL 72B Instruct Pointer AWQ
Other
Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring enhanced visual understanding, agent capabilities, and structured output generation.
Image-to-Text Transformers English
PointerHQ
5,592
8
Uground V1 2B
Apache-2.0
UGround is a strong GUI visual grounding model trained with a simple recipe, jointly developed by OSUNLP and Orby AI.
Multimodal Fusion Transformers English
osunlp
975
8
Colqwen2 V1.0
Apache-2.0
ColQwen2 is a visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, designed for efficient indexing of document visual features.
Text-to-Image Safetensors English
vidore
106.85k
86
Colqwen2 V0.1
Apache-2.0
A visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, capable of efficiently indexing documents by their visual features.
Text-to-Image English
vidore
21.25k
170
© 2025 AIbase